Abstract - A coding scheme for write once memory (WOM) using polar codes is presented. It is shown that the scheme achieves the capacity region of noiseless WOMs when an arbitrary number of multiple writes is permitted. The encoding and decoding complexities scale as $O(N \log N)$, where $N$ is the blocklength. For $N$ sufficiently large, the error probability decreases sub-exponentially in $N$. Some simulation results with finite length codes are presented.

I. Introduction 

The model of a write once memory (WOM) was proposed by Rivest and Shamir in [1]. In write once memories, writing may be irreversible in the sense that once a memory cell is in some state, it cannot easily be returned to a preceding state. Flash memory is an important example, since in regular operation the charge level of each memory cell can only increase. It is possible to erase a complete block of cells, which comprises a large number of cells, but this is a costly operation and it reduces the life cycle of the device.

Consider a binary write-once memory (WOM) with $N$ memory cells and $t$ writes. Denote the number of possible messages in the $l$-th write by $M_l$ ($1 \le l \le t$). The number of bits that are written in the $l$-th write is $k_l = \log_2 M_l$ and the corresponding code rate is $R_l = k_l/N$. Let $s_l$ denote the $N$-dimensional state vector of the WOM at time (generation) $l$ for $0 \le l \le t$, and suppose that $s_0 = 0$. For $l = 1, 2, \ldots, t$, the binary message vector is $a_l$ ($N R_l$ bits). Given $a_l$ and the memory state $s_{l-1}$, the encoder computes $s_l = E_l(s_{l-1}, a_l)$ using an encoding function $E_l$ and writes the result $s_l$ on the WOM. Note that $s_l \ge s_{l-1}$, where the vector inequality applies componentwise. The decoder uses a decoding function $D_l$ to compute the decoded message $\hat{a}_l = D_l(s_l)$. The goal is to design a low complexity read-write scheme that satisfies the WOM constraints and achieves $\hat{a}_l = a_l$ for $l = 1, 2, \ldots, t$ with high probability for any set of $t$ messages $a_l$, $l = 1, 2, \ldots, t$. As is commonly assumed in the literature, we also assume that the generation number on each write and read is known; it has been explained in prior work why this assumption does not affect the WOM rate.

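The capacity region of the WOM is

$$\mathcal{C}_t = \left\{ (R_1, \ldots, R_t) \in \mathbb{R}_+^t \;\middle|\; R_l \le \alpha_{l-1} h(\epsilon_l),\ l = 1, 2, \ldots, t, \text{ where } 0 = \epsilon_0 \le \epsilon_1, \epsilon_2, \ldots, \epsilon_{t-1} \le \epsilon_t = 1/2 \right\} \qquad (1)$$

where

$$\alpha_{l-1} = \prod_{j=0}^{l-1} (1 - \epsilon_j) \qquad (2)$$

($\mathbb{R}_+^t$ denotes the set of $t$-dimensional vectors with positive elements; $h(x) = -x \log_2 x - (1 - x) \log_2 (1 - x)$ is the binary entropy function). We also define the maximum average rate, $\bar{C}_t$, as the maximum of $(\sum_{j=1}^{t} R_j)/t$ over $(R_1, \ldots, R_t) \in \mathcal{C}_t$. The maximum average rate was shown to be $\bar{C}_t = \log_2(t + 1)/t$.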
This means that the total number of bits that can be stored on $N$ WOM cells in $t$ writes is $N \log_2(t + 1)$, which is significantly higher than $N$. WOM codes were proposed in the past by various authors, e.g. […], and references therein. In this work we propose a new family of WOM codes based on polar codes [7]. The method relies on the fact that polar codes are asymptotically optimal for lossy source coding [8] and can be encoded and decoded efficiently ($O(N \log N)$ operations, where $N$ is the blocklength). We show that our method achieves the capacity region of noiseless WOMs when an arbitrary number of multiple writes is permitted. The encoding and decoding complexities scale as $O(N \log N)$. For $N$ sufficiently large, the error probability is at most $2^{-N^\beta}$ for any $0 < \beta < 1/2$. We also design actual codes and present their performance.
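The corner point of (1)-(2) and the maximum average rate can be checked numerically. The Python sketch below evaluates $R_l = \alpha_{l-1} h(\epsilon_l)$ for a given parameter vector. The helper `max_avg_rate_params`, which sets $\epsilon_l = 1/(t - l + 2)$, is a hypothetical name and an assumption of this illustration: it reproduces the $t = 2$ and $t = 3$ parameters used in Section V, and the sum of the resulting rates telescopes to $\log_2(t + 1)$.

```python
import math


def h2(x: float) -> float:
    """Binary entropy h(x) in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def wom_corner_point(eps):
    """Return [R_1, ..., R_t] with R_l = alpha_{l-1} * h(eps_l).

    eps = [eps_1, ..., eps_t] (eps_t should be 1/2); alpha_{l-1} follows (2)
    with eps_0 = 0, so alpha_0 = 1.
    """
    rates = []
    alpha = 1.0                       # alpha_0
    for e in eps:
        rates.append(alpha * h2(e))   # R_l = alpha_{l-1} h(eps_l)
        alpha *= 1.0 - e              # alpha_l = alpha_{l-1} (1 - eps_l)
    return rates


def max_avg_rate_params(t: int):
    """Hypothetical helper (assumption): eps_l = 1/(t - l + 2), l = 1, ..., t."""
    return [1.0 / (t - l + 2) for l in range(1, t + 1)]


if __name__ == "__main__":
    for t in (2, 3, 4):
        r = wom_corner_point(max_avg_rate_params(t))
        # average rate of the corner point vs. log2(t + 1) / t
        print(t, [round(x, 4) for x in r], sum(r) / t, math.log2(t + 1) / t)
```

For instance, `wom_corner_point([1/4, 1/3, 1/2])` returns approximately `[0.8113, 0.6887, 0.5]`, matching the rates quoted for the $t = 3$ experiment.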

II. Background on Polar codes 

In his seminal work [7], Arikan introduced polar codes for channel coding and showed that they can achieve the symmetric capacity (i.e. the capacity under uniform input distribution) of an arbitrary binary-input channel. In […] it was shown that the results can be generalized to arbitrary discrete memoryless channels. We will follow the notation in […].

Let $G_2 = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$ and let its $n$-th Kronecker product be $G_2^{\otimes n}$. Also denote $N = 2^n$. Let $u$ be an $N$-dimensional binary $\{0, 1\}$ message vector, and let $x = u G_2^{\otimes n}$, where the matrix multiplication is over GF(2). Suppose that we transmit $x$ over a memoryless binary-input channel with transition probability $W(y \mid x)$ and channel output vector $y$. If $u$ is chosen at random with uniform probability, then the resulting probability distribution $P(u, x, y)$ is given by

$$P(u, x, y) = \frac{1}{2^N} \mathbb{1}_{\{x = u G_2^{\otimes n}\}} \prod_{i=0}^{N-1} W(y_i \mid x_i) \qquad (3)$$

Define the following $N$ sub-channels,

$$W_N^{(i)}(y, u_0^{i-1} \mid u_i) = P(y, u_0^{i-1} \mid u_i), \qquad i = 0, 1, \ldots, N-1$$

where $u_0^{i-1} = (u_0, \ldots, u_{i-1})$.
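The product $x = u G_2^{\otimes n}$ never requires forming the $N \times N$ matrix. A minimal Python sketch, assuming bits are stored as a list of 0/1 integers, applies $G_2$ to index pairs in $\log_2 N$ butterfly stages ($O(N \log N)$ XORs) and cross-checks the result against an explicit Kronecker power. Since $G_2^2 = I$ over GF(2), $G_2^{\otimes n}$ is its own inverse, so the same routine also recovers $u$ from $x$. Function names are illustrative.

```python
def polar_transform(u):
    """Return x = u G_2^{(x)n} over GF(2) using an O(N log N) butterfly.

    u is a list of 0/1 integers whose length N is a power of two.
    """
    x = list(u)
    n_len = len(x)
    if n_len & (n_len - 1):
        raise ValueError("length must be a power of two")
    step = 1
    while step < n_len:
        for start in range(0, n_len, 2 * step):
            for j in range(start, start + step):
                # G_2 = [[1, 0], [1, 1]] acting on the pair (j, j + step)
                x[j] ^= x[j + step]
        step *= 2
    return x


def kron_power(n):
    """Explicit G_2^{(x)n} as a list of rows (used only for cross-checking)."""
    g = [[1]]
    for _ in range(n):
        size = len(g)
        # G_2 (x) M = [[M, 0], [M, M]]
        g = [[g[r % size][c % size] if c // size <= r // size else 0
              for c in range(2 * size)] for r in range(2 * size)]
    return g


def vec_mat_gf2(u, g):
    """Row vector times matrix over GF(2)."""
    return [sum(u[r] & g[r][c] for r in range(len(u))) % 2 for c in range(len(g[0]))]


if __name__ == "__main__":
    u = [1, 0, 1, 1]
    print(polar_transform(u))                   # [1, 1, 0, 1]
    print(vec_mat_gf2(u, kron_power(2)))        # [1, 1, 0, 1]
    print(polar_transform(polar_transform(u)))  # [1, 0, 1, 1], i.e. u again
```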



Denote by $I(W)$ the symmetric capacity of the channel $W$ (it is the channel capacity when the channel is memoryless binary-input output symmetric (MBIOS)) and by $Z(W_N^{(i)})$ the Bhattacharyya parameter of the sub-channel $W_N^{(i)}$. In [7], [10] it was shown that, asymptotically in $N$, a fraction $I(W)$ of the sub-channels satisfy $Z(W_N^{(i)}) < 2^{-2^{n\beta}}$ for any $0 < \beta < 1/2$.
Based on this result the following communication scheme was proposed. Let $R$ be the code rate. Denote by $F$ the set of $N(1 - R)$ sub-channels with the highest values of $Z(W_N^{(i)})$ (denoted in the sequel as the frozen set), and by $F^c$ the remaining $NR$ sub-channels. Fix the input to the sub-channels in $F$ to some arbitrary frozen vector $u_F$ (known both to the encoder and to the decoder) and use the channels in $F^c$ to transmit information. The encoder then transmits $x = u G_2^{\otimes n}$ over the channel. The decoder applies the following successive cancellation (SC) scheme for $i = 0, 1, 2, \ldots, N-1$. Denote

$$L_N^{(i)} = \frac{W_N^{(i)}(y, \hat{u}_0^{i-1} \mid 0)}{W_N^{(i)}(y, \hat{u}_0^{i-1} \mid 1)}$$

If $i \in F$ then $\hat{u}_i = u_i$ ($u_F$ is common knowledge). Otherwise, if $L_N^{(i)} > 1$ then $\hat{u}_i = 0$, and if $L_N^{(i)} < 1$ then $\hat{u}_i = 1$.

Asymptotically, reliable communication is possible for any $R < I(W)$, and the SC decoder can be implemented in complexity $O(N \log N)$.
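For a general channel the values $Z(W_N^{(i)})$ have to be estimated (Section V uses Monte-Carlo simulation and density evolution). For the binary erasure channel, which Section IV notes arises as the test channel of the last write, Arikan's recursion is exact, so the frozen set can be computed directly. The Python sketch below assumes the natural index order of $u$ in $x = u G_2^{\otimes n}$ (no bit-reversal permutation); the function names are hypothetical.

```python
def bec_bhattacharyya(n: int, erasure: float):
    """Z(W_N^{(i)}) for i = 0, ..., N-1, W = BEC(erasure), N = 2^n.

    Uses Arikan's recursion, which is exact for the BEC:
      Z(W_{2N}^{(2i)}) = 2Z - Z^2,  Z(W_{2N}^{(2i+1)}) = Z^2,  Z = Z(W_N^{(i)}).
    """
    z = [erasure]
    for _ in range(n):
        nxt = []
        for v in z:
            nxt.append(2.0 * v - v * v)  # degraded sub-channel
            nxt.append(v * v)            # upgraded sub-channel
        z = nxt
    return z


def frozen_set(z, rate: float):
    """Indices of the round(N(1 - R)) sub-channels with the largest Z values."""
    n_frozen = round(len(z) * (1.0 - rate))
    worst_first = sorted(range(len(z)), key=lambda i: z[i], reverse=True)
    return sorted(worst_first[:n_frozen])


if __name__ == "__main__":
    z = bec_bhattacharyya(2, 0.5)
    print(z)                   # [0.9375, 0.5625, 0.4375, 0.0625]
    print(frozen_set(z, 0.5))  # [0, 1]
```

Because $2Z - Z^2 + Z^2 = 2Z$, the recursion preserves $\sum_i Z(W_N^{(i)}) = N \, Z(W)$, which is a convenient sanity check.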

Polar codes can also be used for lossy source coding [8]. Consider a binary symmetric source (BSS), i.e. a random binary vector $Y$ uniformly distributed over all $N$-dimensional binary vectors. Let $d(x, y)$ be a distance measure between two binary vectors, $x$ and $y$, such that $d(x, y) = \sum_i d(x_i, y_i)$, where $d(0, 0) = d(1, 1) = 0$ and $d(0, 1) = d(1, 0) = 1$. Define a binary symmetric channel (BSC) $W(y \mid x)$ with crossover parameter $D$ and construct a polar code with frozen set $F$ that consists of the $(1 - R)N$ sub-channels with the largest values of $Z(W_N^{(i)})$. This code uses some arbitrary frozen vector $u_F$, which is known both to the encoder and to the decoder (e.g. $u_F = 0$), and has rate $R = |F^c|/N$. Given $Y = y$, the SC encoder applies the following scheme. For $i = 0, 1, \ldots, N-1$, if $i \in F$ then $\hat{u}_i = u_i$; otherwise

$$\hat{u}_i = \begin{cases} 0 & \text{w.p. } L_N^{(i)} / (L_N^{(i)} + 1) \\ 1 & \text{w.p. } 1 / (L_N^{(i)} + 1) \end{cases} \qquad (4)$$



(w.p. denotes with probability). The complexity of this scheme is $O(N \log N)$. Since $\hat{u}_F = u_F$ is common knowledge, the decoder only needs to obtain $\hat{u}_{F^c}$ from the encoder ($|F^c|$ bits). It can then reconstruct the approximating source codeword $x$ using $x = \hat{u} G_2^{\otimes n}$. Let $E\, d(X(Y), Y)/N$ be the average distortion of this polar code (the averaging is over both the source vector, $Y$, and over the approximating source codeword, $X(Y)$, which is determined at random from $Y$). Also denote by $R(D) = 1 - h(D)$ the rate distortion function. In [8] it was shown, given any $0 < D < 1/2$, $0 < \delta < 1 - R(D)$ and $0 < \beta < 1/2$, that for $N$ (i.e., $n$) sufficiently large, $R = |F^c|/N = R(D) + \delta$, and any frozen vector $u_F$, the polar code with rate $R$ under SC encoding satisfies

$$E\, d(X(Y), Y)/N \le D + O\left(2^{-N^\beta}\right) \qquad (5)$$



In fact, as noted in [8], the proof of (5) is not restricted to a BSS and extends to general sources, e.g. a binary erasure source [8].
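The SC encoder with the randomized rule (4) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: likelihood ratios are handled in the log domain ($\lambda = \ln L_N^{(i)}$, so $L_N^{(i)}/(L_N^{(i)} + 1)$ is the logistic function of $\lambda$), the recursion uses the standard split $x = (c \oplus b, b)$ of $x = u G_2^{\otimes n}$ into the codewords $c$ and $b$ of the two halves of $u$, and the source is coded with a BSC($D$) test channel. All function names are hypothetical.

```python
import math
import random


def boxplus(a: float, b: float) -> float:
    """LLR of the XOR of two independent bits with LLRs a and b (exact, overflow-safe)."""
    return (math.copysign(1.0, a) * math.copysign(1.0, b) * min(abs(a), abs(b))
            + math.log1p(math.exp(-abs(a + b)))
            - math.log1p(math.exp(-abs(a - b))))


def prob_zero(llr: float) -> float:
    """L / (L + 1) with L = exp(llr), evaluated without overflow."""
    if llr >= 0.0:
        return 1.0 / (1.0 + math.exp(-llr))
    e = math.exp(llr)
    return e / (1.0 + e)


def sc_encode(llr, frozen, rng):
    """Randomized SC encoder for x = u G_2^{(x)n} (recursive, O(N log N)).

    llr[j] = log W(y_j|0) / W(y_j|1); frozen maps each index in F to its value.
    Returns (u, x); u is decided in the order i = 0, ..., N-1.
    """
    u = []

    def recurse(lv, offset):
        if len(lv) == 1:
            if offset in frozen:
                bit = frozen[offset]                               # i in F
            else:
                bit = 0 if rng.random() < prob_zero(lv[0]) else 1  # rule (4)
            u.append(bit)
            return [bit]
        h = len(lv) // 2
        left, right = lv[:h], lv[h:]
        # x = (c xor b, b): c encodes the first half of u, b the second half.
        c = recurse([boxplus(p, q) for p, q in zip(left, right)], offset)
        g = [q + (1 - 2 * cj) * p for p, q, cj in zip(left, right, c)]
        b = recurse(g, offset + h)
        return [cj ^ bj for cj, bj in zip(c, b)] + b

    x = recurse(list(llr), 0)
    return u, x


def bsc_llr(y, d: float):
    """Channel LLRs of a BSC(d) test channel for the source word y."""
    lam = math.log((1.0 - d) / d)
    return [lam if yj == 0 else -lam for yj in y]


if __name__ == "__main__":
    rng = random.Random(0)
    y = [rng.randint(0, 1) for _ in range(16)]
    frozen = {i: 0 for i in (0, 1, 2, 3, 4, 5, 6, 8)}  # hypothetical frozen set
    u, x = sc_encode(bsc_llr(y, 0.11), frozen, rng)
    print(sum(a != b for a, b in zip(x, y)) / len(y))  # empirical distortion
```

With no frozen bits and $D$ close to 0 the encoder reproduces $y$ almost surely; with a frozen set chosen as in the text, the fraction of positions where $x$ and $y$ differ should approach $D$ as $N$ grows, in line with (5).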

III. Extended results for Polar source codes 

Although the result in [8] is concerned only with the average distortion, one may combine (5) with the strong converse result of the rate distortion theorem in [11, p. 127] to conclude that $|d(X(Y), Y)/N - D|$ can be made arbitrarily small with probability that approaches 1 as $n$ increases. We now extend this result. The following discussion is valid for an arbitrary discrete MBIOS, $W(y \mid x)$, in (3). As in [8] we construct a source polar code with frozen set defined by

$$F = \left\{ i \in \{0, 1, \ldots, N-1\} : Z(W_N^{(i)}) \ge 1 - 2\delta_N^2 \right\} \qquad (6)$$

(note that $F$ depends on $N$; however, for simplicity our notation does not show this dependence explicitly) and

$$\delta_N = 2^{-N^\beta}/(2N) \qquad (7)$$

By [8, Theorem 19 and Equation (22)] (see also […, Equation (12)]),

$$\lim_{N \to \infty} |F|/N = 1 - I(W)$$

Hence, for any $\epsilon > 0$, if $N$ is large enough then the rate $R$ of the code satisfies

$$R = 1 - |F|/N < I(W) + \epsilon$$

Let $y$ be a source vector produced by a sequence of independent identically distributed (i.i.d.) realizations of $Y$. If $u_F$ is chosen at random with uniform probability, then the vector $u$ produced by the SC encoder (that utilizes (4)) has a conditional probability distribution given by [8]

$$Q(u \mid y) = \prod_{i=0}^{N-1} Q(u_i \mid u_0^{i-1}, y) \qquad (8)$$

where

$$Q(u_i \mid u_0^{i-1}, y) = \begin{cases} 1/2 & \text{if } i \in F \\ P(u_i \mid u_0^{i-1}, y) & \text{if } i \in F^c \end{cases} \qquad (9)$$

On the other hand, the conditional probability of $u$ given $y$ corresponding to (3) is

$$P(u \mid y) = \prod_{i=0}^{N-1} P(u_i \mid u_0^{i-1}, y)$$



In the sequel we employ standard strong typicality arguments. Similarly to the notation in [12, Section 10.6, pp. 325-326], we define $\epsilon$-strongly typical sequences $x, y \in \mathcal{X}^N \times \mathcal{Y}^N$ with respect to a distribution $p(x, y)$ on $\mathcal{X} \times \mathcal{Y}$, and denote the set of such sequences by $A_\epsilon^{*(N)}(X, Y)$ (or $A_\epsilon^{*(N)}$ for short), as follows. Let $C(a, b \mid x, y)$ denote the number of occurrences of the symbol pair $(a, b)$ in $(x, y)$. Then $x, y \in A_\epsilon^{*(N)}(X, Y)$ if the following two conditions hold. First, for all $(a, b) \in \mathcal{X} \times \mathcal{Y}$ with $p(a, b) > 0$, $|C(a, b \mid x, y)/N - p(a, b)| < \epsilon$. Second, for all $(a, b) \in \mathcal{X} \times \mathcal{Y}$ with $p(a, b) = 0$, $C(a, b \mid x, y) = 0$.



In our case $x = x(u) = u G_2^{\otimes n}$. Note that $G_2^{\otimes n}$ is a full rank matrix. Therefore each vector $u$ corresponds to exactly one vector $x$. We say that $u, y \in A_\epsilon^{*(N)}(U, Y)$ if $x(u), y \in A_\epsilon^{*(N)}(X, Y)$ with respect to the probability distribution $p(x, y) = W(y \mid x)/2$ (see (3)).
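The two conditions in the definition of $A_\epsilon^{*(N)}$ translate directly into a membership test. The Python sketch below is illustrative (names are hypothetical); it takes the joint distribution as a dictionary, and checking $u, y \in A_\epsilon^{*(N)}(U, Y)$ amounts to calling it on $(u G_2^{\otimes n}, y)$ with $p(x, y) = W(y \mid x)/2$.

```python
from collections import Counter


def is_strongly_typical(x, y, p, eps: float) -> bool:
    """Test whether (x, y) lies in A*_eps^{(N)}(X, Y) as defined above.

    p maps symbol pairs (a, b) to p(a, b); pairs absent from p have p(a, b) = 0.
    """
    n = len(x)
    counts = Counter(zip(x, y))  # C(a, b | x, y)
    # Second condition: pairs with p(a, b) = 0 must not occur at all.
    if any(p.get(ab, 0.0) == 0.0 for ab in counts):
        return False
    # First condition: |C(a, b | x, y)/N - p(a, b)| < eps whenever p(a, b) > 0.
    return all(abs(counts.get(ab, 0) / n - pab) < eps
               for ab, pab in p.items() if pab > 0.0)


def bsc_joint(d: float):
    """p(x, y) = W(y|x)/2 for a BSC(d) with uniform input, as in (3)."""
    return {(0, 0): (1 - d) / 2, (0, 1): d / 2, (1, 0): d / 2, (1, 1): (1 - d) / 2}


if __name__ == "__main__":
    x = [0] * 5 + [1] * 5
    y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
    print(is_strongly_typical(x, y, bsc_joint(0.1), 0.06))  # True
```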

Theorem 1: Let the source vector random variable $Y$ be created by a sequence of $N$ i.i.d. realizations of $Y$. Consider a polar code for source coding [8] with block length $N = 2^n$, and let $U$ be the random variable denoting the output of the SC encoder. Then for any $0 < \beta < 1/2$, $\epsilon > 0$ and $n$ sufficiently large, $U, Y \in A_\epsilon^{*(N)}(U, Y)$ w.p. at least $1 - 2^{-N^\beta}$.

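Recall that the SC encoder's output $u$ has conditional probability distribution $Q(u \mid y)$ given by (8)-(9). Hence, for large $n$, Theorem 1 asserts $Q\left(A_\epsilon^{*(N)}(U, Y)\right) \ge 1 - 2^{-N^\beta}$.

Proof: By [8, Lemma 5 and Lemma 7],

$$\left| \sum_{(u, y) \in A_\epsilon^{*(N)}} Q(u, y) - \sum_{(u, y) \in A_\epsilon^{*(N)}} P(u, y) \right| \le \sum_{u, y} \left| Q(u, y) - P(u, y) \right| \le 2|F|\delta_N \qquad (10)$$

where $Q(u, y) = Q(u \mid y) P(y)$. In addition,

$$P\left(A_\epsilon^{*(N)}\right) = 1 - P\left[\exists\, a, b : \left| \frac{1}{N} C(a, b \mid X(U), Y) - p(a, b) \right| \ge \epsilon \right]$$

where we have used the fact that $p(a, b) = 0$ implies $C(a, b \mid X(U), Y) = 0$. Under $P$, $X(U)$ is uniformly distributed (since $G_2^{\otimes n}$ is full rank), so the pairs $(X_i(U), Y_i)$ are i.i.d. with distribution $p(x, y)$. For a fixed pair $(a, b)$, let $Z_i$ be a binary $\{0, 1\}$ random variable such that $Z_i = 1$ if $(X_i(U), Y_i) = (a, b)$ and $Z_i = 0$ otherwise. Then

$$P(Z_i = 1) = p(a, b), \qquad C(a, b \mid X(U), Y) = \sum_{i=0}^{N-1} Z_i$$

Applying Hoeffding's inequality and the union bound, it follows that $P\left(A_\epsilon^{*(N)}\right) \ge 1 - e^{-N\gamma}$ for some constant $\gamma > 0$ (that can depend on $\epsilon$). Combining this with (10) we get

$$Q\left(A_\epsilon^{*(N)}\right) \ge 1 - e^{-N\gamma} - 2|F|\delta_N$$

Recalling the definition of $\delta_N$ in (7), and noting that $2|F|\delta_N \le 2^{-N^\beta}$ since $|F| \le N$, the theorem follows. □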
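The Hoeffding and union-bound step can be written out explicitly. This worked derivation assumes, as in the proof, that under $P$ the pairs $(X_i(U), Y_i)$ are i.i.d. with distribution $p(x, y)$; the exponent $\gamma$ obtained here is one admissible choice and not necessarily the constant intended by the authors. For each pair $(a, b)$, the $Z_i$ take values in $\{0, 1\}$, so Hoeffding's inequality gives

$$P\left(\left|\frac{1}{N}\sum_{i=0}^{N-1} Z_i - p(a, b)\right| \ge \epsilon\right) \le 2e^{-2N\epsilon^2}$$

and a union bound over the at most $|\mathcal{X}||\mathcal{Y}|$ pairs with $p(a, b) > 0$ yields

$$P\left(A_\epsilon^{*(N)}\right) \ge 1 - 2|\mathcal{X}||\mathcal{Y}|\, e^{-2N\epsilon^2} \ge 1 - e^{-N\gamma}$$

where the last inequality holds for any fixed $0 < \gamma < 2\epsilon^2$ once $N \ge \ln(2|\mathcal{X}||\mathcal{Y}|)/(2\epsilon^2 - \gamma)$.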
Although not needed in this paper, it can now be shown that for $n$ sufficiently large, $d(X(Y), Y)/N \le D + \delta$ w.p. at least $1 - 2^{-N^\beta}$.



IV. The proposed polar WOM code 

Given some set of parameters $0 \le \epsilon_1, \epsilon_2, \ldots, \epsilon_{t-1} \le 1/2$, $\epsilon_0 = 0$ and $\epsilon_t = 1/2$, we first consider the following $t$ test channels. The input set of each channel is $\{0, 1\}$. The output set is $\{(0,0), (0,1), (1,0), (1,1)\}$. Denote the input random variable by $X$ and the output by $(S, V)$. The probability transition function of the $l$-th channel is defined by

$$P_l\left((S, V) = (s, v) \mid X = x\right) = f(s, x \oplus v) \qquad (11)$$

where

$$f(s, b) = \begin{cases} \alpha_{l-1}(1 - \epsilon_l) & \text{if } s = 0, b = 0 \\ \alpha_{l-1}\epsilon_l & \text{if } s = 0, b = 1 \\ 1 - \alpha_{l-1} & \text{if } s = 1, b = 0 \\ 0 & \text{if } s = 1, b = 1 \end{cases} \qquad (12)$$

and where $\alpha_{l-1}$ is defined in (2). This channel is also shown in Fig. 1. It is easy to verify that the capacity of this channel is $1 - \alpha_{l-1} h(\epsilon_l)$ and that the capacity achieving input distribution is symmetric, i.e., $P(X = 0) = P(X = 1) = 1/2$ (with a uniform input, $H(S, V) = 1 + h(\alpha_{l-1})$ and $H(S, V \mid X) = h(\alpha_{l-1}) + \alpha_{l-1} h(\epsilon_l)$).

Fig. 1. The probability transition function of the $l$-th channel (outputs $(0,0)$, $(0,1)$, $(1,0)$, $(1,1)$).

For each channel $l$, we design a polar code with blocklength $N$ and frozen set of sub-channels $F_l$ defined by (6). The rate is

$$R'_l = 1 - \alpha_{l-1} h(\epsilon_l) + \delta_l \qquad (13)$$

where $\delta_l > 0$ is arbitrarily small for $N$ sufficiently large. This code will be used as a source code. The relation between $R_l$ and $R'_l$ is

$$R_l = 1 - R'_l$$
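The capacity claim for the channel (11)-(12), and the fact that its output marginal under a uniform input is the source (14), can be checked numerically. The Python sketch below (hypothetical function names) builds $P_l((s, v) \mid x)$ from $f(s, x \oplus v)$ and evaluates $I(X; (S, V))$ for $P(X = 0) = P(X = 1) = 1/2$, which should equal $1 - \alpha_{l-1} h(\epsilon_l)$.

```python
import math


def h2(x: float) -> float:
    """Binary entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def wom_test_channel(alpha_prev: float, eps: float):
    """The l-th test channel (11)-(12): returns {x: {(s, v): P_l((s, v) | x)}}."""
    f = {(0, 0): alpha_prev * (1.0 - eps), (0, 1): alpha_prev * eps,
         (1, 0): 1.0 - alpha_prev, (1, 1): 0.0}
    return {x: {(s, v): f[(s, x ^ v)] for s in (0, 1) for v in (0, 1)}
            for x in (0, 1)}


def uniform_input_information(channel):
    """Return (I(X; (S, V)) in bits, output distribution) for P(X=0)=P(X=1)=1/2."""
    p_out = {o: 0.5 * (channel[0][o] + channel[1][o]) for o in channel[0]}
    info = 0.0
    for x in (0, 1):
        for o, w in channel[x].items():
            if w > 0.0:
                info += 0.5 * w * math.log2(w / p_out[o])
    return info, p_out


if __name__ == "__main__":
    alpha, eps = 2 / 3, 0.5          # second write of the t = 2 example
    info, p_out = uniform_input_information(wom_test_channel(alpha, eps))
    print(info, 1 - alpha * h2(eps))  # both 1/3
    print(p_out)                      # the source distribution (14)
```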

Now we define $E_l(s, a)$ and $D_l(s)$ as follows.
Encoding function, $s' = E_l(s, a)$:

1) Let $v = s \oplus g$, where $\oplus$ denotes bitwise XOR and $g$ is a sample from an $N$-dimensional uniformly distributed random binary $\{0, 1\}$ vector. The vector $g$ is a common randomness source (dither), known both to the encoder and to the decoder.

2) Let $y_j = (s_j, v_j)$ and $y = (y_1, y_2, \ldots, y_N)$. Compress the vector $y$ using the $l$-th polar code with $u_{F_l} = a$. This results in a vector $u$ and a vector $x = u G_2^{\otimes n}$.

3) Finally $s' = x \oplus g$.

Decoding function, $\hat{a} = D_l(s')$:

1) Let $x = s' \oplus g$.

2) $\hat{a} = \left( x \left(G_2^{\otimes n}\right)^{-1} \right)_{F_l}$, where $(z)_{F_l}$ denotes the elements of the vector $z$ in the set $F_l$.

Note that the information is embedded within the set $F_l$. Hence, when considered as a WOM code, our code has rate $R_l = |F_l|/N = (N - |F_l^c|)/N = 1 - R'_l$.
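The encoding and decoding functions above can be sketched in Python as follows. The sketch omits (M1)-(M4), and `compress` stands for the SC source encoder of the $l$-th polar code; it is a hypothetical callable, not part of any library. The decoder uses the fact that $G_2^{\otimes n}$ is its own inverse over GF(2), so $(G_2^{\otimes n})^{-1}$ is computed by the same transform.

```python
import random


def polar_transform(u):
    """x = u G_2^{(x)n} over GF(2); applying it twice returns u."""
    x = list(u)
    step = 1
    while step < len(x):
        for start in range(0, len(x), 2 * step):
            for j in range(start, start + step):
                x[j] ^= x[j + step]
        step *= 2
    return x


def wom_encode(s, a, frozen_idx, g, compress):
    """Sketch of E_l(s, a) (steps 1-3), without (M1)-(M4).

    compress is a hypothetical SC source encoder for the l-th test channel:
    compress(y, frozen_values) -> u with u[i] == frozen_values[i] for i in F_l.
    """
    v = [sj ^ gj for sj, gj in zip(s, g)]     # step 1: v = s xor g
    y = list(zip(s, v))                        # step 2: y_j = (s_j, v_j)
    u = compress(y, dict(zip(frozen_idx, a)))  #   with u_{F_l} = a
    x = polar_transform(u)                     #   x = u G_2^{(x)n}
    return [xj ^ gj for xj, gj in zip(x, g)]   # step 3: s' = x xor g


def wom_decode(s_new, frozen_idx, g):
    """D_l(s'): a = (x (G_2^{(x)n})^{-1})_{F_l} with x = s' xor g."""
    x = [sj ^ gj for sj, gj in zip(s_new, g)]  # step 1
    u = polar_transform(x)                     # inverse = same transform
    return [u[i] for i in frozen_idx]          # step 2: keep u_{F_l}


if __name__ == "__main__":
    rng = random.Random(0)
    n_len, f_l = 16, [0, 1, 2, 4, 8]               # hypothetical frozen set F_l
    g = [rng.randint(0, 1) for _ in range(n_len)]  # dither shared by E_l and D_l
    a = [1, 0, 1, 1, 0]
    # Placeholder compressor: u_{F_l} = a, all other bits 0. It ignores y,
    # so unlike the SC encoder it does NOT respect the WOM constraint.
    stub = lambda y, fv: [fv.get(i, 0) for i in range(len(y))]
    s_new = wom_encode([0] * n_len, a, f_l, g, stub)
    print(wom_decode(s_new, f_l, g) == a)          # True
```

The placeholder compressor only demonstrates that the dither $g$ cancels and that $a$ is read back from $u_{F_l}$. Respecting the WOM constraint requires the actual SC encoder, whose channel LLR for $s_j = 0$ is $(1 - 2v_j)\ln((1 - \epsilon_l)/\epsilon_l)$ and which must force $x_j = v_j$ when $s_j = 1$ (since $f(1, 1) = 0$ in (12)).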

For the sake of the proof we slightly modify the coding scheme as follows:

(M1) The definition of the $l$-th channel is modified such that in (12) we use $\epsilon_l - \zeta$ instead of $\epsilon_l$, where $\zeta > 0$ will be chosen arbitrarily small.

(M2) The encoder sets $u_{F_l} = a_l \oplus g'_l$ instead of $u_{F_l} = a_l$, where $g'_l$ is an $|F_l|$-dimensional uniformly distributed binary (dither) vector known both at the encoder and decoder. In this way, the assumption that $u_{F_l}$ is uniformly distributed holds. Similarly, the decoder modifies its operation to $\hat{a} = \left( x \left(G_2^{\otimes n}\right)^{-1} \right)_{F_l} \oplus g'_l$.

(M3) We assume a random permutation of the input vector $y$ prior to quantization in each polar code. These random permutations are known both at the encoder and decoder. More precisely, in step 2 the encoder applies the permutation on $y$ to produce $\tilde{y}$. Then it compresses $\tilde{y}$ and obtains the codeword $\tilde{x}$. Finally it applies the inverse permutation on $\tilde{x}$ to produce $x$ and proceeds to step 3. The decoder, at the end of step 1, permutes $x$ to produce $\tilde{x}$ and uses $\tilde{x}$ instead of $x$ in step 2.

(M4) Denote the Hamming weight of the WOM state $s_l$ after $l$ writes by $T_l = w_H(s_l)$. Also denote the binomial distribution with $N$ trials and success probability $1 - \alpha$ by $B(N, 1 - \alpha)$, such that $T \sim B(N, 1 - \alpha)$ if, for $k = 0, 1, \ldots, N$, $\Pr(T = k) = \binom{N}{k} (1 - \alpha)^k \alpha^{N-k}$. After the $l$-th write we pick a number $k$ from the distribution $B(N, 1 - \alpha_l)$. If $w_H(s_l) < k$ then we flip $k - w_H(s_l)$ elements in $s_l$ from 0 to 1.
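Step (M4) can be sketched as below (Python; the function name is hypothetical). The text does not specify which zero entries are flipped, so choosing them uniformly at random is an assumption of this sketch; the binomial draw is simulated with $N$ independent Bernoulli$(1 - \alpha_l)$ trials.

```python
import random


def binomial_pad(s, alpha_l: float, rng):
    """Sketch of (M4): draw k ~ B(N, 1 - alpha_l); if w_H(s) < k, flip
    k - w_H(s) zeros of s to ones. Returns (new state, k).

    Which zeros are flipped is not specified in the text; choosing them
    uniformly at random is an assumption of this sketch.
    """
    n_len = len(s)
    k = sum(1 for _ in range(n_len) if rng.random() < 1.0 - alpha_l)  # Binomial draw
    weight = sum(s)
    out = list(s)
    if weight < k:
        zeros = [j for j, bit in enumerate(s) if bit == 0]
        for j in rng.sample(zeros, k - weight):
            out[j] = 1  # only 0 -> 1 changes, so out >= s componentwise
    return out, k


if __name__ == "__main__":
    rng = random.Random(0)
    s = [1, 0, 0, 0] * 4                 # weight 4 out of N = 16
    print(binomial_pad(s, 2 / 3, rng))   # padded toward weight about N/3
```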
Theorem 2: Consider an arbitrary information sequence $a_1, \ldots, a_t$ with rates $R_1, R_2, \ldots, R_t$ that are inside the capacity region (1) of the binary WOM. For any $0 < \beta < 1/2$ and $N$ sufficiently large, the coding scheme described above can be used to write this sequence reliably over the WOM w.p. at least $1 - 2^{-N^\beta}$ in encoding and decoding complexities $O(N \log N)$.

To prove the theorem we need the following lemma¹. Consider an i.i.d. source $(S, V)$ with the following probability distribution,

$$P\left((S, V) = (s, v)\right) = \begin{cases} (1 - \alpha_{l-1})/2 & \text{if } s = 1, v = 0 \\ \alpha_{l-1}/2 & \text{if } s = 0, v = 0 \\ \alpha_{l-1}/2 & \text{if } s = 0, v = 1 \\ (1 - \alpha_{l-1})/2 & \text{if } s = 1, v = 1 \end{cases} \qquad (14)$$

Note that this source has the marginal distribution of the output of the $l$-th channel defined by (11)-(12) under a symmetric input distribution.

Lemma 1: Consider a polar code designed for the $l$-th channel defined by (11)-(12) as described above. The code has rate $R'_l$ defined in (13), a frozen set of sub-channels, $F_l$, and some frozen vector $U_{F_l}$, which is uniformly distributed over all $|F_l|$-dimensional binary vectors. The code is used to encode a random vector $(S, V)$ drawn by i.i.d. sampling from the distribution (14) using the SC encoder. Denote by $X$ the encoded codeword. Then for any $\delta > 0$, $0 < \beta < 1/2$ and $N$ sufficiently large, the following holds w.p. at least $1 - 2^{-N^\beta}$:

$$\left|\{k : S_k = 0 \text{ and } X_k \oplus V_k = 1\}\right| < (\epsilon_l \alpha_{l-1} + \delta) N$$

$$\{k : S_k = 1 \text{ and } X_k \oplus V_k = 1\} = \emptyset$$

Proof: The proof follows from Theorem 1, which asserts that, for $N$ (i.e., $n$) large enough, $(X(U), (S, V)) \in A_\epsilon^{*(N)}(X, (S, V))$ w.p. at least $1 - 2^{-N^\beta}$ (the second statement uses the second typicality condition, since $f(1, 1) = 0$ in (12)). The details are omitted due to space limitations.

¹This lemma is formulated for the original channel with parameter $\epsilon_l$, and not for the (M1) modified channel with parameter $\epsilon_l - \zeta$.

We proceed to the proof of Theorem 2. We denote by $S_l$, $S$, $V$, $G$, $X$ and $T_l$ the random variables corresponding to $s_l$, $s$, $v$, $g$, $x$ and $w_H(s_l)$.

Proof of Theorem 2: Note that we only need to prove successful encoding, since the WOM is noiseless.

Recall that $T_l = w_H(S_l)$. Suppose that $T_{l-1} \sim B(N, 1 - \alpha_{l-1})$. Our first claim is that under this assumption, for $\xi > 0$ sufficiently small and $N$ sufficiently large, w.p. at least $1 - 2^{-N^\beta}$, the encoding will be successful and $T_l/N \le 1 - \alpha_l - \xi$. Considering step 1 of the encoding, we see that $(S, V)$ can be considered as i.i.d. sampling of the source $(S, V)$ defined in (14) (since $G$ is a BSS and using (M3) above). Hence, by Lemma 1 (with $\delta/2$ instead of $\delta$) and (M1), the compression of this vector in step 2 satisfies the following for any $\delta > 0$ and $N$ sufficiently large, w.p. at least $1 - 2^{-N^\beta}$:

1) If $S_k = 1$ then $X_k = V_k = S_k \oplus G_k = G_k \oplus 1$.

2) For at most $[(\epsilon_l - \zeta)\alpha_{l-1} + \delta/2] N$ components $k$ we have $S_k = 0$ and $X_k = V_k \oplus 1 = S_k \oplus G_k \oplus 1 = G_k \oplus 1$.

Hence in step 3 of the encoding, if $S_k = 1$ then $S'_k = X_k \oplus G_k = 1$ (i.e. the WOM constraints are satisfied). In addition, there are at most $[(\epsilon_l - \zeta)\alpha_{l-1} + \delta/2] N$ components $k$ for which $S_k = 0$ and $S'_k = 1$. Therefore, w.p. at least $1 - 2^{-N^\beta}$, the vectors $S$ and $S'$ satisfy the WOM constraints and

$$w_H(S') \le \left[1 - \alpha_{l-1} + (\epsilon_l - \zeta)\alpha_{l-1} + \delta\right] N = \left[1 - \alpha_l - \zeta\alpha_{l-1} + \delta\right] N \qquad (15)$$

(in the first inequality we have used the fact that for $n$ sufficiently large, $T_{l-1} \le (1 - \alpha_{l-1} + \delta/2) N$ w.p. at least $1 - e^{-N\eta}$ for some $\eta > 0$ independent of $N$; the equality follows from $\alpha_l = \alpha_{l-1}(1 - \epsilon_l)$, see (2)). Setting $\xi = \zeta\alpha_{l-1} - \delta$, which is positive for $\delta < \zeta\alpha_{l-1}$, yields our first claim.

From (15) we know that the number $k$ drawn in (M4) will indeed satisfy the required condition $k \ge w_H(s_l)$ w.p. at least $1 - 2^{-N^\beta}$, so that after (M4) the weight of $s_l$ equals $k$. The proof of the theorem now follows by using induction on $l$ to conclude that (w.p. at least $1 - 2^{-N^\beta}$) the $l$-th encoding is successful and $T_l \sim B(N, 1 - \alpha_l)$. The complexity claim is due to the results in […]. □

We note the following. The test channel in the first write is actually a BSC (since $\alpha_0 = 1$ in Fig. 1). Similarly, it is easy to verify that in the last ($t$-th) write we can merge together the source symbols $(0,0)$ and $(0,1)$, thus obtaining a test channel which is a binary erasure channel (BEC).

In practice (e.g., in flash memory), the dither g can be 
determined from the address of the data word (i.e., the address 
is used as a seed to a random number generator). 



In the rare event of an encoding error, the encoder may re-encode using another dither value. The decoder can determine the correct dither value either by direct communication (similarly to the assumption that the generation number is known), or by switching to the next dither value upon detecting (e.g., using a CRC) a decoding failure.

V. Simulation Results 

To demonstrate the performance of our coding scheme for finite length codes, we performed experiments with polar WOM codes with $n = 10, 12, 14, 16$. Each polar code was constructed using the test channel in Fig. 1 with the appropriate parameters $\epsilon_l$ and $\alpha_{l-1}$. To learn the frozen set $F_l$ of each code we used the Monte-Carlo approach that was described in [13] (which is a variant of the method proposed by Arikan [7]). Fig. 2 summarizes our results with $t = 2$ write WOMs designed to maximize the average rate. Using the results in […], we set $\epsilon_1 = 1/3$. Hence $\alpha_1 = 2/3$. Each point in each graph was determined by averaging the results of 1000 Monte-Carlo experiments. Fig. 2 (left) shows the success rate of the first write as a function of the rate loss $\Delta R_1$ compared to the optimum ($R_1 = h(1/3) = 0.9183$) for each value of $n$. Here success is defined as $w_H(s_1)/N \le \epsilon_1$. Fig. 2 (right) shows the success rate of the second write as a function of the rate loss $\Delta R_2$ compared to the optimum ($R_2 = 2/3$) for each value of $n$. Here success is defined as successful encoding (and decoding) of both data items under the WOM constraints. Each experiment in the second write was performed by using a first write with rate loss of $\Delta R_1 = 0.01$. For $n = 10, 12, 14$, $\Delta R_1$ should be higher, but this is compensated by using higher values of $\Delta R_2$. As an alternative, we could have used a higher rate loss $\Delta R_1$ for $n = 10, 12, 14$, in which case $\Delta R_2$ decreases. In terms of total rate loss both options yielded similar results. We see that for $n = 16$ the total rate loss required for successful (with very high probability) first and second writes is about 0.08.
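A generic genie-aided Monte-Carlo estimate of the sub-channel reliabilities is sketched below in Python. It follows the idea of Arikan's simulation method; the specific variant of [13] may differ, and all names are hypothetical. It relies on the symmetry of the test channel, so the all-zero codeword can be transmitted and every past bit supplied by the genie is 0. For the first write the test channel is a BSC with crossover $\epsilon_1$ (see Section IV); `rate` is the source-code rate $R'_l$, so the $N(1 - R'_l)$ least reliable sub-channels are frozen.

```python
import math
import random


def boxplus(a: float, b: float) -> float:
    """LLR of the XOR of two independent bits (exact, overflow-safe)."""
    return (math.copysign(1.0, a) * math.copysign(1.0, b) * min(abs(a), abs(b))
            + math.log1p(math.exp(-abs(a + b)))
            - math.log1p(math.exp(-abs(a - b))))


def genie_leaf_llrs(llr):
    """Decision LLRs of SC decoding for u_0, ..., u_{N-1} when a genie supplies
    the true past bits; the transmitted codeword is assumed all-zero."""
    if len(llr) == 1:
        return [llr[0]]
    h = len(llr) // 2
    left, right = llr[:h], llr[h:]
    out = genie_leaf_llrs([boxplus(a, b) for a, b in zip(left, right)])
    out += genie_leaf_llrs([a + b for a, b in zip(left, right)])  # past bits are 0
    return out


def mc_error_rates(n: int, p: float, trials: int, seed: int = 0):
    """Estimated genie-aided bit error rate of each sub-channel of a BSC(p)."""
    rng = random.Random(seed)
    lam = math.log((1.0 - p) / p)
    n_len = 1 << n
    errors = [0.0] * n_len
    for _ in range(trials):
        # all-zero codeword through a BSC(p): each LLR is -lam w.p. p, else +lam
        llr = [-lam if rng.random() < p else lam for _ in range(n_len)]
        for i, l in enumerate(genie_leaf_llrs(llr)):
            if l < 0.0:
                errors[i] += 1.0
            elif l == 0.0:
                errors[i] += 0.5  # tie: count as half an error
    return [e / trials for e in errors]


def mc_frozen_set(n: int, p: float, rate: float, trials: int = 1000, seed: int = 0):
    """Freeze the round(N(1 - R)) sub-channels with the highest estimated error rate."""
    err = mc_error_rates(n, p, trials, seed)
    worst_first = sorted(range(len(err)), key=lambda i: err[i], reverse=True)
    return sorted(worst_first[: round(len(err) * (1.0 - rate))])


if __name__ == "__main__":
    # First write of the t = 2 example: BSC(1/3), source-code rate 1 - 0.9183 + 0.05
    print(mc_frozen_set(6, 1 / 3, 1 - 0.9183 + 0.05, trials=300))
```

Larger `n` and more trials give more reliable estimates at a proportional cost; the margin added to the rate plays the role of the rate loss $\Delta R_1$.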



Fig. 2. Left: The performance curves of the first write to the WOM (success rate versus $\Delta R_1$). Right: The performance curves of the second write to the WOM (success rate versus $\Delta R_2$).

We have also experimented with a $t = 3$ write WOM. We set $\epsilon_1 = 1/4$, $\epsilon_2 = 1/3$ and $\epsilon_3 = 1/2$ ($\alpha_1 = 3/4$ and $\alpha_2 = 1/2$) to maximize the average rate in accordance with [3]. To find the frozen set $F_l$ of each code we used density evolution [14], [13]. The maximum average rate is obtained for $R_1 = 0.8113$, $R_2 = 0.6887$ and $R_3 = 1/2$. The actual information rates for a polar code with $n = 16$ were $R_1 = 0.7913$, $R_2 = 0.6687$ and $R_3 = 0.34$. For $M = 1000$ read/write experiments, all information triples were encoded (and decoded) successfully.

VI. Discussion 

One possible generalization of our work is to the case of a noisy WOM. In this case one might wish to consider communication over a Gelfand-Pinsker (GP) channel and use the results in [8]. However, these results may not be suitable for WOM codes, as they require a two-stage writing process where the second write does not satisfy the power constraint.

Other codes and decoding methods may be considered in our WOM scheme, for example low-density generating-matrix (LDGM) codes, which were shown to be useful in the past for lossy compression. Since iterative decoding usually yields better results than SC decoding of polar codes […], it may be possible to improve the performance of our SC encoder by using iterative encoding combined with decimation.